geometric feature
Geo-SIC: Learning Deformable Geometric Shapes in Deep Image Classifiers
Deformable shapes provide important and complex geometric features of objects presented in images. However, such information is oftentimes missing or underutilized as implicit knowledge in many image analysis tasks. This paper presents Geo-SIC, the first deep learning model to learn deformable shapes in a deformation space for an improved performance of image classification. We introduce a newly designed framework that (i) simultaneously derives features from both image and latent shape spaces with large intra-class variations; and (ii) gains increased model interpretability by allowing direct access to the underlying geometric features of image data. In particular, we develop a boosted classification network, equipped with an unsupervised learning of geometric shape representations characterized by diffeomorphic transformations within each class. In contrast to previous approaches using pre-extracted shapes, our model provides a more fundamental approach by naturally learning the most relevant shape features jointly with an image classifier. We demonstrate the effectiveness of our method on both simulated 2D images and real 3D brain magnetic resonance (MR) images. Experimental results show that our model substantially improves the image classification accuracy with an additional benefit of increased model interpretability. Our code is publicly available at https://github.com/jw4hv/Geo-SIC.
ADA-DPM: A Neural Descriptors-based Adaptive Noise Filtering Strategy for SLAM
Shao, Yongxin, Tan, Aihong, Wang, Binrui, Jin, Yinlian, Guan, Licong, Liao, Peng
Lidar SLAM plays a significant role in mobile robot navigation and high-definition map construction. However, existing methods often face a trade-off between localization accuracy and system robustness in scenarios with a high proportion of dynamic objects, point cloud distortion, and unstructured environments. To address this issue, we propose a neural descriptors-based adaptive noise filtering strategy for SLAM, named ADA-DPM, which improves the performance of localization and mapping tasks through three key technical innovations. Firstly, to tackle dynamic object interference, we design the Dynamic Segmentation Head to predict and filter out dynamic feature points, eliminating the ego-motion interference caused by dynamic objects. Secondly, to mitigate the impact of noise and unstructured feature points, we propose the Global Importance Scoring Head that adaptively selects high-contribution feature points while suppressing the influence of noise and unstructured feature points. Moreover, we introduce the Cross-Layer Graph Convolution Module (GLI-GCN) to construct multi-scale neighborhood graphs, fusing local structural information across different scales and improving the discriminative power of overlapping features. Finally, experimental validations on multiple public datasets confirm the effectiveness of ADA-DPM.
CardioComposer: Leveraging Differentiable Geometry for Compositional Control of Anatomical Diffusion Models
Kadry, Karim, Goraya, Shoaib, Manicka, Ajay, Abdelwahed, Abdalla, Chutisilp, Naravich, Nezami, Farhad, Edelman, Elazer
Generative models of 3D cardiovascular anatomy can synthesize informative structures for clinical research and medical device evaluation, but face a trade-off between geometric controllability and realism. We propose CardioComposer: a programmable, inference-time framework for generating multi-class anatomical label maps based on interpretable ellipsoidal primitives. These primitives represent geometric attributes such as the size, shape, and position of discrete substructures. We specifically develop differentiable measurement functions based on voxel-wise geometric moments, enabling loss-based gradient guidance during diffusion model sampling. We demonstrate that these losses can constrain individual geometric attributes in a disentangled manner and provide compositional control over multiple substructures. Finally, we show that our method is compatible with a wide array of anatomical systems containing non-convex substructures, spanning cardiac, vascular, and skeletal organs.
- North America > United States > Virginia (0.04)
- North America > United States > Massachusetts (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.93)
- North America > United States > Virginia (0.05)
- Europe > United Kingdom > North Sea > Southern North Sea (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
From Propagation to Prediction: Point-level Uncertainty Evaluation of MLS Point Clouds under Limited Ground Truth
Xu, Ziyang, Wysocki, Olaf, Holst, Christoph
Evaluating uncertainty is critical for reliable use of Mobile Laser Scanning (MLS) point clouds in many high-precision applications such as Scan-to-BIM, deformation analysis, and 3D modeling. However, obtaining the ground truth (GT) for evaluation is often costly and infeasible in many real-world applications. To reduce this long-standing reliance on GT in uncertainty evaluation research, this study presents a learning-based framework for MLS point clouds that integrates optimal neighborhood estimation with geometric feature extraction. Experiments on a real-world dataset show that the proposed framework is feasible and the XGBoost model delivers fully comparable accuracy to Random Forest while achieving substantially higher efficiency (about 3 times faster), providing initial evidence that geometric features can be used to predict point-level uncertainty quantified by the C2C distance. In summary, this study shows that MLS point clouds' uncertainty is learnable, offering a novel learning-based viewpoint towards uncertainty evaluation research.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- (2 more...)
VO-DP: Semantic-Geometric Adaptive Diffusion Policy for Vision-Only Robotic Manipulation
Ni, Zehao, He, Yonghao, Qian, Lingfeng, Mao, Jilei, Fu, Fa, Sui, Wei, Su, Hu, Peng, Junran, Wang, Zhipeng, He, Bin
In the context of imitation learning, visuomotor-based diffusion policy learning is one of the main directions in robotic manipulation. Most of these approaches rely on point clouds as observation inputs and construct scene representations through point clouds feature learning, which enables them to achieve remarkable accuracy. However, the existing literature lacks an in-depth exploration of vision-only solutions that have significant potential. In this paper, we propose a Vision-Only and single-view Diffusion Policy learning method (VO-DP) that leverages pretrained visual foundation models to achieve effective fusion of semantic and geometric features. We utilize intermediate features from VGGT incorporating semantic features from DINOv2 and geometric features from Alternating Attention blocks. Features are fused via cross-attention and spatially compressed with a CNN to form the input to the policy head. Extensive experiments demonstrate that VO-DP not only outperforms the vision-only baseline DP significantly but also exhibits distinct performance trends against the point cloud-based method DP3: in simulation tasks, VO-DP achieves an average success rate of 64.6% on par with DP3 64.0% and far higher than DP 34.8%, while in real-world tasks, it reaches 87.9%, outperforming both DP3 67.5% and DP 11.2% by a notable margin. Further robustness evaluations confirm that VO-DP remains highly stable under varying conditions including color, size, background, and lighting. Lastly, we open-source a training library for robotic manipulation. Built on Accelerate, this library supports multi-machine and multi-GPU parallel training, as well as mixed precision training. It is compatible with visuomotor policies such as DP, DP3 and VO-DP, and also supports the RoboTwin simulator.
Point-level Uncertainty Evaluation of Mobile Laser Scanning Point Clouds
Xu, Ziyang, Wysocki, Olaf, Holst, Christoph
Y et, despite this progress, the point clouds acquired by MLS systems operating in real-world environments inevitably contain uncertainty arising from various error sources during acquisition and processing. Although MLS systems have advanced rapidly in both data collection and post-processing, research on uncertainty evaluation has received comparatively less attention and remains underdeveloped (Xu et al., 2025b). From a user's perspective, the quality of point clouds from MLS systems is a critical concern. As the foundational input for many downstream tasks, inadequate assessment of MLS point clouds' quality can easily impact high-precision applications such as navigation and change analysis. This will not only undermine reliability but also result in substantial waste of time and resources, which is unacceptable in real-world applications. There is a clear need for automated and reliable solutions for uncertainty evaluation. In MLS systems, four main categories of error sources contribute to uncertainty: instrumental errors, atmospheric errors, object-and geometry-related errors, and trajectory estimation errors (Habib et al., 2009, Schenk, 2001). Considering the characteristics of these error sources, existing uncertainty evaluation methods can be broadly divided into two categories: forward modeling and backward modeling (Shi et al., 2021). The core idea of forward modeling is grounded in variance-covariance propagation, which involves detailed theoretical analysis of MLS system errors.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- Asia > Taiwan (0.04)
DAMM-LOAM: Degeneracy Aware Multi-Metric LiDAR Odometry and Mapping
Chandna, Nishant, Kaushal, Akshat
LiDAR Simultaneous Localization and Mapping (SLAM) systems are essential for enabling precise navigation and environmental reconstruction across various applications. Although current point-to-plane ICP algorithms perform effec- tively in structured, feature-rich environments, they struggle in scenarios with sparse features, repetitive geometric structures, and high-frequency motion. This leads to degeneracy in 6- DOF pose estimation. Most state-of-the-art algorithms address these challenges by incorporating additional sensing modalities, but LiDAR-only solutions continue to face limitations under such conditions. To address these issues, we propose a novel Degeneracy-Aware Multi-Metric LiDAR Odometry and Map- ping (DAMM-LOAM) module. Our system improves mapping accuracy through point cloud classification based on surface normals and neighborhood analysis. Points are classified into ground, walls, roof, edges, and non-planar points, enabling accurate correspondences. A Degeneracy-based weighted least squares-based ICP algorithm is then applied for accurate odom- etry estimation. Additionally, a Scan Context based back-end is implemented to support robust loop closures. DAMM-LOAM demonstrates significant improvements in odometry accuracy, especially in indoor environments such as long corridors
Two-stream network-driven vision-based tactile sensor for object feature extraction and fusion perception
Huang, Muxing, Chen, Zibin, Xu, Weiliang, Li, Zilan, Zhou, Yuanzhi, Zhou, Guoyuan, Chen, Wenjing, Li, Xinming
Tactile perception is crucial for embodied intelligent robots to recognize objects. Vision-based tactile sensors extract object physical attributes multidimensionally using high spatial resolution; however, this process generates abundant redundant information. Furthermore, single-dimensional extraction, lacking effective fusion, fails to fully characterize object attributes. These challenges hinder the improvement of recognition accuracy. To address this issue, this study introduces a two-stream network feature extraction and fusion perception strategy for vision-based tactile systems. This strategy employs a distributed approach to extract internal and external object features. It obtains depth map information through three-dimensional reconstruction while simultaneously acquiring hardness information by measuring contact force data. After extracting features with a convolutional neural network (CNN), weighted fusion is applied to create a more informative and effective feature representation. In standard tests on objects of varying shapes and hardness, the force prediction error is 0.06 N (within a 12 N range). Hardness recognition accuracy reaches 98.0%, and shape recognition accuracy reaches 93.75%. With fusion algorithms, object recognition accuracy in actual grasping scenarios exceeds 98.5%. Focused on object physical attributes perception, this method enhances the artificial tactile system ability to transition from perception to cognition, enabling its use in embodied perception applications.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
GPI-Net: Gestalt-Guided Parallel Interaction Network via Orthogonal Geometric Consistency for Robust Point Cloud Registration
Gu, Weikang, Han, Mingyue, Xue, Li, Dong, Heng, Yang, Changcai, Chen, Riqing, Wei, Lifang
The accurate identification of high-quality correspondences is a prerequisite task in feature-based point cloud registration. However, it is extremely challenging to handle the fusion of local and global features due to feature redundancy and complex spatial relationships. Given that Gestalt principles provide key advantages in analyzing local and global relationships, we propose a novel Gestalt-guided Parallel Interaction Network via orthogonal geometric consistency (GPI-Net) in this paper. It utilizes Gestalt principles to facilitate complementary communication between local and global information. Specifically, we introduce an orthogonal integration strategy to optimally reduce redundant information and generate a more compact global structure for high-quality correspondences. To capture geometric features in correspondences, we leverage a Gestalt Feature Attention (GFA) block through a hybrid utilization of self-attention and cross-attention mechanisms. Furthermore, to facilitate the integration of local detail information into the global structure, we design an innovative Dual-path Multi-Granularity parallel interaction aggregation (DMG) block to promote information exchange across different granularities. Extensive experiments on various challenging tasks demonstrate the superior performance of our proposed GPI-Net in comparison to existing methods. The code will be released at https://github.com/gwk429/GPI-Net.
- Asia > South Korea > Seoul > Seoul (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)